Determining the Unithood of Word Sequences Using a Probabilistic Approach

نویسندگان

  • Wilson Wong
  • Wei Liu
  • Mohammed Bennamoun
چکیده

Most research related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work were mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our comparative study using 1, 825 test cases against an existing empiricallyderived function revealed an improvement in terms of precision, recall and accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

Most works related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, the number of independent research that study the notion of unithood and produce dedicated techniques for measuring unithood is extremely small. We propose a new approach, independent of any influences of termhood, that provides dedicated measures to gather linguistic eviden...

متن کامل

Multi-granulation fuzzy probabilistic rough sets and their corresponding three-way decisions over two universes

This article introduces a general framework of multi-granulation fuzzy probabilistic roughsets (MG-FPRSs) models in multi-granulation fuzzy probabilistic approximation space over twouniverses. Four types of MG-FPRSs are established, by the four different conditional probabilitiesof fuzzy event. For different constraints on parameters, we obtain four kinds of each type MG-FPRSs...

متن کامل

ارائه یک مدل احتمالاتی جهت تعیین انسجام متن در سیستم های پرسش و پاسخ تعاملی

Evaluation plays an important role in interactive question answering systems like many computational linguistics fields. The coherence between the questions and the answers exchanged between the user and the system is one of the important criteria in evaluating these systems. In this paper, a new approach to determine the degree of coherence of generated text by the IQA systems is presented. Th...

متن کامل

Using it Bundles in Published and Unpublished Writings

Lexical bundles are known as important elements of coherent discourse that have been the subject of much research. While the previous research has been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few studies have addressed the use of some particular groups of lexical bundles within some types of academic writing....

متن کامل

A Study on Terminology Extraction Based on Classified Corpora

Algorithms for automatic term extraction in a specific domain should consider at least two issues, namely Unithood and Termhood(Kageura,1996). Unithood refers to the degree of a string to occur as a word or a phrase. Termhood (Chen Yirong, 2005) refers to the degree of a word or a phrase to occur as a domain specific concept. Unlike unithood, study on termhood is not yet widely reported. In cla...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008